7 research outputs found

    Productive Development of Scalable Network Functions with NFork

    Despite decades of research, developing correct and scalable concurrent programs remains challenging, and network functions (NFs) are no exception. This paper presents NFork, a system that helps NF domain experts productively develop concurrent NFs by abstracting concurrency away from developers. The key idea behind NFork's design is to exploit NF characteristics to overcome the limitations of prior work on concurrent programming. Developers write NFs as sequential programs, and at runtime NFork performs transparent parallelization by processing packets on different cores. Exploiting NF characteristics, NFork leverages transactional memory and develops efficient concurrent data structures to achieve scalability and guarantee the absence of concurrency bugs. Since NFork manages concurrency, it further provides (i) a profiler that reveals the root causes of scalability bottlenecks inherent to an NF's semantics and (ii) actionable recipes for developers to mitigate these root causes by relaxing the NF's semantics. We show that NFs developed with NFork achieve scalability competitive with their counterparts in Cisco VPP [16], and that NFork's profiler and recipes effectively aid developers in optimizing NF scalability. Comment: 16 pages, 8 figures
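The programming model the abstract describes (a sequential per-packet handler that the runtime transparently parallelizes) can be sketched roughly as follows. This toy stands in a coarse lock where NFork uses transactional memory and specialized concurrent data structures; all names here are illustrative, not NFork's actual API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sequential NF: a per-source packet counter.
# The developer writes this as if it were single-threaded.
class CounterNF:
    def __init__(self):
        self.counts = {}

    def handle(self, packet):
        src = packet["src"]
        self.counts[src] = self.counts.get(src, 0) + 1

# Toy "runtime" that parallelizes the sequential NF across worker
# threads, guarding shared state with one lock (a crude stand-in for
# NFork's transactional memory and concurrent data structures).
class ParallelRuntime:
    def __init__(self, nf, workers=4):
        self.nf = nf
        self.lock = threading.Lock()
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def _guarded(self, packet):
        with self.lock:  # NFork would apply finer-grained concurrency control
            self.nf.handle(packet)

    def process(self, packets):
        list(self.pool.map(self._guarded, packets))  # force completion
        self.pool.shutdown()

nf = CounterNF()
rt = ParallelRuntime(nf)
rt.process([{"src": "10.0.0.1"}] * 100 + [{"src": "10.0.0.2"}] * 50)
```

The developer-visible code (`CounterNF`) never mentions threads or locks; only the runtime does, which is the separation the paper's design aims for.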

    Scalable Range Locks for Scalable Address Spaces and Beyond

    Range locks are a synchronization construct designed to provide multiple threads (or processes) concurrent access to disjoint parts of a shared resource. Originally conceived in the file system context, range locks are gaining interest in the Linux kernel community as a way to alleviate bottlenecks in the virtual memory management subsystem. The existing implementation of range locks in the kernel, however, uses an internal spin lock to protect the underlying tree structure that keeps track of acquired and requested ranges. This spin lock becomes a point of contention on its own when the range lock is frequently acquired. Furthermore, where and how specific (refined) ranges can be locked remains an open question. In this paper, we make two independent but related contributions. First, we propose an alternative approach for building range locks based on linked lists. The lists are easy to maintain in a lock-less fashion, and in fact our range locks do not use any internal locks in the common case. Second, we show how the range of the lock can be refined in the mprotect operation through a speculative mechanism. This refinement, in turn, allows concurrent execution of mprotect operations on non-overlapping memory regions. We implement our new algorithms and demonstrate their effectiveness in user space and kernel space, achieving up to 9× speedup compared to the stock version of the Linux kernel. Beyond the virtual memory management subsystem, we discuss other applications of range locks in parallel software. As a concrete example, we show how range locks can be used to facilitate the design of scalable concurrent data structures, such as skip lists. Comment: 17 pages, 9 figures, Eurosys 202
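The list-based idea can be sketched in miniature: a range is acquired by recording it in a list of held ranges once no overlapping range is present. The paper maintains this list lock-lessly; this toy guards it with a condition variable for clarity, and the names are illustrative rather than the paper's.

```python
import threading

# Toy list-based range lock. A half-open range [start, end) is acquired
# by appending it to the list of held ranges once no overlap exists.
class RangeLock:
    def __init__(self):
        self.held = []                 # (start, end) ranges currently held
        self.cv = threading.Condition()

    def _overlaps(self, start, end):
        return any(s < end and start < e for s, e in self.held)

    def acquire(self, start, end):
        with self.cv:
            while self._overlaps(start, end):
                self.cv.wait()         # block until no conflicting range
            self.held.append((start, end))

    def release(self, start, end):
        with self.cv:
            self.held.remove((start, end))
            self.cv.notify_all()       # waiters' ranges may now be free

rl = RangeLock()
rl.acquire(0, 100)
rl.acquire(100, 200)   # disjoint ranges are held concurrently
rl.release(0, 100)
rl.release(100, 200)
```

Disjoint acquirers never contend on each other's ranges, which is the property that lets non-overlapping mprotect (or VMA) operations proceed in parallel.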

    Scaling synchronization primitives

    Over the past decade, multicore machines have become the norm. A single machine is capable of having thousands of hardware threads or cores, and even cloud providers offer such large multicore machines for data processing engines and databases. Thus a fundamental question arises: how efficient are the existing synchronization primitives (timestamping and locking) that developers use for designing concurrent, scalable, and performant applications? This dissertation focuses on the scalability of these primitives and presents new algorithms and approaches that leverage either the hardware or application domain knowledge to scale up to hundreds of cores. First, the thesis presents Ordo, a scalable ordering (timestamping) primitive that forms the basis for designing scalable timestamp-based concurrency control mechanisms. Ordo relies on invariant hardware clocks and provides the notion of a globally synchronized clock within a machine. We use the Ordo primitive to redesign a synchronization mechanism and the concurrency control mechanisms in databases and software transactional memory. The thesis then turns to the scalability of locks in both virtualized and non-virtualized scenarios. In a virtualized environment, we identify that locks suffer from various preemption issues due to a semantic gap between the hypervisor scheduler and a virtual machine scheduler (the double scheduling problem). We address this problem by bridging the gap: the hypervisor and virtual machines share minimal scheduling information to avoid preemption problems. Finally, we focus on the design of lock algorithms in general. We find that locks in practice diverge from locks in design: popular spinlocks suffer from excessive cache-line bouncing on multicore (NUMA) systems, while state-of-the-art locks exhibit sub-par single-thread performance.
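The Ordo idea from the first part of this abstract can be sketched as follows: each core has an invariant clock, and an empirically measured bound on the inter-core offset (here called `ORDO_BOUNDARY`, with an arbitrary placeholder value) determines when two timestamps are safely comparable. This is an illustrative reading of the design, not the dissertation's implementation.

```python
import time

# Placeholder uncertainty bound between per-core clocks, in nanoseconds.
# In Ordo this is measured empirically once per machine.
ORDO_BOUNDARY = 100

def read_core_clock():
    return time.monotonic_ns()  # stand-in for reading one core's invariant clock

def ordo_new_time():
    # Wait out the uncertainty window so the returned timestamp is
    # globally "new": any core reading its clock afterwards sees a
    # value at least this large.
    t = read_core_clock()
    while read_core_clock() - t < ORDO_BOUNDARY:
        pass
    return read_core_clock()

def ordo_cmp(t1, t2):
    # Timestamps closer than the uncertainty bound are incomparable (0);
    # otherwise their order is meaningful across cores.
    if abs(t1 - t2) <= ORDO_BOUNDARY:
        return 0
    return -1 if t1 < t2 else 1
```

The point of the comparison rule is that ordering decisions are only made when they are guaranteed correct despite clock skew, without any cross-core synchronization on the timestamping path.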
We classify several dominating factors that impact the performance of lock algorithms, and we propose a new technique, shuffling, that can dynamically accommodate all of these factors without slowing down the lock's critical path. The key idea of shuffling is to reorder the queue of threads waiting to acquire the lock according to some pre-established policy. Using shuffling, we propose a family of locking algorithms, called SHFLLOCKS, that respect all of the factors, efficiently utilize waiters, and achieve the best performance.
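The shuffling step itself can be illustrated with one plausible policy: off the critical path, group waiters that share the current holder's NUMA node so the lock migrates across sockets less often, while keeping FIFO order within each group. This is a hypothetical sketch of the idea, not the SHFLLOCKS code.

```python
# Each waiter is a dict with a thread id and the NUMA node it runs on.
def shuffle_waiters(queue, holder_node):
    # Stable partition: same-node waiters first, preserving FIFO order
    # within each group (a simple fairness-preserving policy).
    same = [w for w in queue if w["node"] == holder_node]
    other = [w for w in queue if w["node"] != holder_node]
    return same + other

queue = [
    {"tid": 1, "node": 1},
    {"tid": 2, "node": 0},
    {"tid": 3, "node": 1},
    {"tid": 4, "node": 0},
]
print([w["tid"] for w in shuffle_waiters(queue, holder_node=0)])  # [2, 4, 1, 3]
```

Because the reordering is done by waiters while they spin, the thread releasing the lock pays nothing extra, which is what "without slowing down the critical path" refers to.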

    Fuzzing file systems via two-dimensional input space exploration

    File systems, a basic building block of an OS, are too big and too complex to be bug free. Nevertheless, file systems rely on regular stress-testing tools and formal checkers to find bugs, both of which are limited by the ever-increasing complexity of file systems and OSes. Fuzzing, proven to be an effective and practical approach, thus becomes a preferable choice, as it needs little knowledge about the target. However, fuzzing file systems poses three main challenges: mutating a large image blob degrades overall performance, file operations are image-dependent, and found bugs are difficult for existing OS fuzzers to reproduce. Hence, we present JANUS, the first feedback-driven fuzzer that explores the two-dimensional input space of a file system, i.e., mutating metadata on a large image while emitting image-directed file operations. In addition, JANUS relies on a library OS rather than on traditional VMs for fuzzing, which enables JANUS to load a fresh copy of the OS, thereby leading to better reproducibility of bugs. We evaluate JANUS on eight file systems and found 90 bugs in the upstream Linux kernel, 62 of which have been acknowledged; 43 bugs have been fixed, with 32 CVEs assigned. In addition, JANUS achieves higher code coverage on all eight file systems after 12 hours of fuzzing than the state-of-the-art file system fuzzing configuration of Syzkaller, visiting 4.19× and 2.01× more code paths in Btrfs and ext4, respectively. Moreover, JANUS reproduces 88-100% of the crashes, while Syzkaller fails on all of them.
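The "two-dimensional" shape of a JANUS-style test case can be sketched as a pair: a mutated image where only metadata bytes are touched, plus a workload whose arguments are drawn from objects actually present in that image. The offsets, operation names, and file paths below are illustrative, not JANUS's real formats.

```python
import random

# Dimension 1: mutate only the metadata region of a file-system image,
# leaving the (large, mostly irrelevant) data blocks alone.
def mutate_metadata(image, metadata_offsets, rng):
    img = bytearray(image)
    off = rng.choice(metadata_offsets)   # touch metadata, not the full blob
    img[off] ^= 1 << rng.randrange(8)    # single-bit flip
    return bytes(img)

# Dimension 2: emit image-directed operations, so arguments reference
# files that exist in the image and the workload stays meaningful.
def emit_ops(files_in_image, rng, n=3):
    ops = ["open", "read", "rename", "unlink"]
    return [(rng.choice(ops), rng.choice(files_in_image)) for _ in range(n)]

rng = random.Random(42)              # fixed seed: each case is replayable
image = bytes(64)                    # toy 64-byte "image"
case = (mutate_metadata(image, metadata_offsets=[0, 8, 16], rng=rng),
        emit_ops(["/a", "/b/c"], rng))
```

A real fuzzer would feed `case` to the target under coverage instrumentation and keep it in the corpus if it explores new paths; the seeded RNG mirrors the reproducibility concern the paper addresses with a library OS.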